Alternative solutions to diluted p-spin models and XORSAT problems

We derive analytical solutions for p-spin models with finite connectivity at zero temperature. These models are the statistical mechanics equivalent of p-XORSAT problems in theoretical computer science. We give a full characterization of the phase diagram: the location of the phase transitions (static and dynamic), together with a description of the clustering phenomenon taking place in configurational space. We use two alternative methods: the cavity approach and a rigorous derivation.

I. INTRODUCTION

Recent years have seen growing interest in disordered models defined on Bethe-lattice-like topologies, that is, random graphs with finite connectivity (see e.g. Ref. [?]). Appropriate generalizations of mean-field theory are exact on such structures, allowing for an exact solution of spin-glass-like models. The presence of large loops may induce frustration, leading to highly non-trivial properties at low enough temperatures. Interacting models defined over finite-connectivity graphs provide a better approximation to finite-dimensional models than fully connected mean-field models, allowing qualitatively new effects to be discussed. At zero temperature, spin-glass-like models over random graphs correspond to random combinatorial optimization problems of central relevance in theoretical computer science [?].

Quite generally, spin-glass models show an interesting phase diagram in the $(\gamma, T)$ plane (see e.g. Fig. 2 in Ref. [?]), where $\gamma$ is a parameter proportional to the mean connectivity of the underlying random graph and $T$ is the temperature. The frozen phase is located at high $\gamma$ and/or low $T$.

Open questions include the exact location of the critical lines (dynamic and static) and the full characterization of the configurational space in the frozen phase (e.g. ground state energy and threshold energies).

Here we focus on the simplest non-trivial model that can be defined on a random graph with finite mean connectivity, namely the p-spin model. We concentrate on the zero temperature limit, which corresponds to the p-XORSAT problem in theoretical computer science [?]. In this limit the model undergoes two relevant phase transitions [?]. The first one takes place at $\gamma_d$ (for $p = 3$, $\gamma_d = 0.818469$) and corresponds to a clustering phenomenon: for $\gamma < \gamma_d$ all the ground states (GS) form a unique cluster, while for $\gamma > \gamma_d$ they split into an exponentially large (in $N$) number of clusters, each one containing an exponential number of GS. This clustering phenomenon coincides with the formation, in configurational space, of barriers (clusters are well defined only because of the presence of barriers) and of metastable states, which make any greedy search algorithm inefficient. This is why it is usually called the dynamic transition [?]. The second phase transition takes place at $\gamma_c > \gamma_d$ (for $p = 3$, $\gamma_c = 0.917935$) and marks the SAT/UNSAT transition, that is, the point where frustration becomes manifest in the system and the GS energy becomes larger than zero.

We will derive the above scenario via two distinct and complementary methods. The first one is the cavity method. Its power lies in its generality, since it can easily be applied to more complex systems too, e.g. random k-SAT [?]. Within this method the above scenario can be obtained using an Ansatz with a single step of replica symmetry breaking (1RSB). The second method is a rigorous derivation based on the 'leaf removal' algorithm, which reduces the random (hyper)graph to its relevant core. On the core, any interesting quantity (e.g. the number of GS, cluster size and distance) can be easily calculated, since annealed averages coincide with quenched ones.

This rigorous derivation is also of great importance because it is one of the few cases [?] where a highly non-trivial scenario, previously obtained with a replica calculation [?], can be confirmed with rigorous methods. These results confirm the validity of the cavity approach, and may open the way towards the construction of a mathematical basis for Parisi's replica symmetry breaking theory [?].

II. DEFINITION OF THE MODEL

The random p-XORSAT problem consists in finding an assignment to $N$ boolean variables $x_i \in \{0,1\}$, such that a set of $M = \gamma N$ parity checks on these variables are satisfied. Each parity check is of the kind

$$x_{i_1^m} + \dots + x_{i_p^m} = y_m \bmod 2, \qquad m = 1, \dots, M \tag{1}$$

where, for each $m$, the $p$ indices $i_1^m, \dots, i_p^m \in \{1, \dots, N\}$ are chosen randomly and uniformly among the $\binom{N}{p}$ possible p-uples of distinct indices, and the 'coupling' $y_m$ randomly takes the value 0 or 1 with equal probability. The above set of constraints can be written in a more compact way as $Ax = y \bmod 2$, where $A$ is an $M \times N$ random sparse matrix with exactly $p$ ones per row and $y$ is a random vector of 0s and 1s.
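As an illustration of the definition above, the following Python sketch draws a random instance and solves $Ax = y \bmod 2$ by Gaussian elimination over GF(2). The function names (`random_xorsat`, `solve_gf2`) and the bitmask encoding are illustrative choices, not part of the original text; variables are indexed from 0 rather than 1.

```python
import random


def random_xorsat(n, m, p, rng):
    """Draw m parity checks, each on p distinct variables out of n (0-indexed)."""
    rows = [sorted(rng.sample(range(n), p)) for _ in range(m)]
    y = [rng.randint(0, 1) for _ in range(m)]
    return rows, y


def solve_gf2(rows, y, n):
    """Gaussian elimination over GF(2) for the system A x = y mod 2.

    Each equation is an integer bitmask: bit 0 holds the right-hand side,
    bit i + 1 holds the coefficient of x_i.
    Returns (rank, x); x is None if the system is inconsistent, in which case
    the returned rank is the one reached when the contradiction was found.
    """
    basis = {}  # leading bit position -> reduced equation
    for idx, rhs in zip(rows, y):
        eq = rhs
        for i in idx:
            eq ^= 1 << (i + 1)
        while eq > 1:
            lead = eq.bit_length() - 1
            if lead not in basis:
                basis[lead] = eq
                break
            eq ^= basis[lead]
        else:
            if eq == 1:  # the equation reduced to 0 = 1
                return len(basis), None
    # Back-substitution with all free variables set to 0.
    x = [0] * n
    for lead in sorted(basis):
        eq = basis[lead]
        val = eq & 1
        for i in range(lead - 1):
            if (eq >> (i + 1)) & 1:
                val ^= x[i]
        x[lead - 1] = val
    return len(basis), x
```

When the system is consistent, the number of solutions is $2^{N - \mathrm{rank}}$, which is the quantity whose typical value is studied in the rest of the paper.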

Once the mapping $\sigma_i = (-1)^{x_i}$ and $J_m = (-1)^{y_m}$ is performed, the XORSAT problem can also be studied as the zero-temperature limit of the following p-spin Hamiltonian, giving the energy for a configuration of $N$ Ising spins $\sigma_i \in \{-1, 1\}$:

$$H = \sum_{m=1}^{M} \left(1 - J_m\, \sigma_{i_1^m} \cdots \sigma_{i_p^m}\right). \tag{2}$$

Unfrustrated ground state (GS) configurations have zero energy and correspond to solutions of the XORSAT problem, since they satisfy all the constraints: $\forall m,\ \sigma_{i_1^m} \cdots \sigma_{i_p^m} = J_m$.
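The correspondence between parity checks and the Hamiltonian (2) can be checked directly. The sketch below is a minimal illustration with hypothetical helper names (`to_spins`, `energy`); with the normalization of Eq. (2), each violated parity check contributes 2 to the energy.

```python
def to_spins(bits):
    """Map boolean variables x in {0, 1} to Ising spins sigma = (-1)**x."""
    return [(-1) ** b for b in bits]


def energy(spins, hyperedges, couplings):
    """Energy of Eq. (2): sum over interactions of 1 - J_m * prod(sigma)."""
    total = 0
    for idx, j in zip(hyperedges, couplings):
        prod = 1
        for i in idx:
            prod *= spins[i]
        total += 1 - j * prod
    return total


# Two parity checks on four variables: x0+x1+x2 = 1 and x1+x2+x3 = 0 (mod 2).
hyperedges = [(0, 1, 2), (1, 2, 3)]
couplings = to_spins([1, 0])          # J_m = (-1)**y_m
satisfying = to_spins([1, 0, 0, 0])   # satisfies both checks
violating = to_spins([0, 0, 0, 0])    # violates the first check only
```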



III. T = 0 PHASE DIAGRAM FROM THE
ONE-STEP CAVITY METHOD




In this section we present the analysis of the phase diagram of the p-spin problem as it arises from the one-step cavity approach. We consider the cavity formalism directly at zero temperature, as discussed by Mézard and Parisi [?] and developed further in Ref. [?]. We refer to those papers for a review of the method and the notation. Here we limit ourselves to the technical aspects of the analytical calculation for the $p = 3$ case, generalizations to $p > 3$ being straightforward.

The zero temperature p-spin model can be viewed as a relatively simple limiting case of more general problems such as random k-SAT, for which the cavity calculations have also been carried out recently [?]. The main technical difference between random k-SAT-like problems and the p-spin model is that the site dependence of the functional order parameter simplifies dramatically in the p-spin problem below the static transition. This allows for a rigorous derivation of the cavity and replica results by alternative methods, as we shall thoroughly discuss in the subsequent sections.

In the cavity formalism one works with "cavity fields" $h_i$ associated to the sites and "cavity biases" $u_j$ associated to the hyperedges. The cavity field is the effective field on a variable once one of its interactions has been removed. Under a cavity iteration, cavity biases generate cavity fields and vice versa (see Fig. 1). The cavity field $h$ is always the sum of the cavity biases $u$ coming from all its interactions but the one removed. The rule for generating $u$ biases from $h$ fields is in general more complex.

FIG. 1: A pictorial view of the cavity iteration: the cavity fields $h_1$ and $h_2$ are the sum of some cavity biases $u$, and in turn they generate a new cavity bias $u$ according to Eq. (4).

For $T = 0$ the formalism simplifies considerably [?]: cavity fields and cavity biases only take integer values, and the cavity equations can be derived easily by implementing the energy minimization condition under the cavity iteration. Let us imagine adding a hyperedge connecting 3 spins, say spins $\sigma_0$, $\sigma_1$ and $\sigma_2$, among which spin $\sigma_0$ plays the role of cavity spin. We need to perform a partial minimization of the effective energy

$$\min_{\sigma_1, \sigma_2} \left[\epsilon(\sigma_0, \sigma_1, \sigma_2) - (h_1 \sigma_1 + h_2 \sigma_2)\right] = -w_J(h_1, h_2) - u_J(h_1, h_2)\, \sigma_0, \tag{3}$$

where $\epsilon(\sigma_0, \sigma_1, \sigma_2) = 1 - J \sigma_0 \sigma_1 \sigma_2$. The above relation defines the cavity biases $w$ and $u$ as functions of the "input" cavity fields $h$. After a little algebra one finds

$$w_J(h_1, h_2) = |h_1| + |h_2| - |u_J(h_1, h_2)|, \qquad u_J(h_1, h_2) = S(J h_1 h_2), \tag{4}$$

where the function $S(x)$ is defined as

$$S(x) = \begin{cases} 0 & \text{if } x = 0 \\ \mathrm{sign}(x) & \text{if } x \neq 0 \end{cases} \tag{5}$$
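The "little algebra" leading from the partial minimization (3) to the closed form (4) can be verified by brute force for integer fields. The following sketch assumes $\epsilon = 1 - J\sigma_0\sigma_1\sigma_2$ as stated in the text; the function names are illustrative.

```python
from itertools import product


def S(x):
    """Eq. (5): sign of x, with S(0) = 0."""
    return (x > 0) - (x < 0)


def bias_closed_form(J, h1, h2):
    """Eq. (4): returns (w, u) for integer cavity fields h1, h2."""
    u = S(J * h1 * h2)
    w = abs(h1) + abs(h2) - abs(u)
    return w, u


def bias_brute_force(J, h1, h2):
    """Eq. (3): minimize over sigma_1, sigma_2 for each value of sigma_0."""
    def partial_min(s0):
        return min(
            1 - J * s0 * s1 * s2 - (h1 * s1 + h2 * s2)
            for s1, s2 in product((-1, 1), repeat=2)
        )
    m_plus, m_minus = partial_min(+1), partial_min(-1)
    # The minimum equals -w - u * sigma_0, so w and u follow from the two values.
    w = -(m_plus + m_minus) // 2
    u = (m_minus - m_plus) // 2
    return w, u
```

Comparing the two functions over a range of integer fields and both coupling signs reproduces Eq. (4) in every case.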

The free energy of the system can be expressed either in terms of the probability distributions of the cavity fields or of the cavity biases [?].

In a one-step scenario the phase space breaks into many pure states and the order parameter of the model is a complete histogram, over the system, of probability distribution functions of fields, $\mathcal{P}[P(h)]$, and biases, $\mathcal{Q}[Q(u)]$. Such a rich structure of the order parameter can be understood by noticing that each spin may fluctuate from state to state, and therefore the whole collection of single-site probability distributions might be needed to capture such fluctuations. In the simple case of a single pure state, the so-called replica symmetric (RS) phase, single-site probability distributions become delta functions and the order parameter simplifies to a single global probability distribution.

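Following the general scheme discussed in Refs. [?], but with a more convenient normalization for the $Q(u)$, the self-consistency equation for $\mathcal{Q}[Q(u)]$ reads

$$
\begin{aligned}
Q(u) &= E_J \int dh\, P^{(k)}(h)\, dg\, P^{(k')}(g)\; \delta\big(u - u_J(h, g)\big) \qquad \text{with prob. } e^{-3\gamma} \frac{(3\gamma)^k}{k!}\, e^{-3\gamma} \frac{(3\gamma)^{k'}}{k'!}, \\
P^{(k)}(h) &= \frac{1}{A_k} \int du_1 Q_1(u_1) \cdots du_k Q_k(u_k)\; \delta\Big(h - \sum_{i=1}^{k} u_i\Big) \exp\Big[-y \Big(\sum_{i=1}^{k} |u_i| - \Big|\sum_{i=1}^{k} u_i\Big|\Big)\Big], \\
A_k &= \int du_1 Q_1(u_1) \cdots du_k Q_k(u_k)\; \exp\Big[-y \Big(\sum_{i=1}^{k} |u_i| - \Big|\sum_{i=1}^{k} u_i\Big|\Big)\Big],
\end{aligned}
\tag{6}
$$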

where all the $Q_i(u)$ on the r.h.s. are chosen randomly from the distribution $\mathcal{Q}[Q(u)]$. The average $E_J$ over the coupling signs $J = \pm 1$ forces all the distributions to be symmetric under $u \leftrightarrow -u$ or $h \leftrightarrow -h$. The parameter $y$ is the so-called reweighting coefficient ($y = \beta m$, where $m$ is the Parisi breaking parameter), which takes into account level crossing of states under the cavity iterations [?]. The parameter $y$ must be chosen so as to maximize the free energy.

As the cavity biases take values in $\{0, \pm 1\}$, and thanks to the above-mentioned symmetry, each $Q_i(u)$ can be written, in full generality, as

$$Q_i(u) = \eta_i\, \delta(u) + \frac{1 - \eta_i}{2} \left[\delta(u + 1) + \delta(u - 1)\right]. \tag{7}$$



Thus the self-consistency equation for $\mathcal{Q}[Q(u)]$ can be rewritten as a self-consistency equation for the probability distribution of $\eta_i$, $\rho(\eta)$.

Eventually, given the whole set of stationary $\{Q_i(u)\}$ or the stationary $\rho(\eta)$, the average ground state energy and the complexity can be deduced from the formulae of Refs. [?].

A. Solution of the self-consistency equation

1. RS solution

We first notice that it is always possible to recover the simple replica symmetric solution by fixing $y = 0$ and assuming that the cavity biases are "certain", $Q_j(u) = \delta(u - u_j)$, where the $u_j$ are independent and identically distributed random variables taken from a distribution

$$Q(u) = c_0\, \delta(u) + \frac{1 - c_0}{2} \left[\delta(u - 1) + \delta(u + 1)\right]. \tag{8}$$

Plugging the above form into Eqs. (6), one finds for $c_0$

$$1 - \sqrt{1 - c_0} = \sum_{q=0}^{\infty} e^{-3\gamma} \frac{(3\gamma)^q}{q!} \sum_{j=0}^{[q/2]} \frac{q!}{(j!)^2\, (q - 2j)!} \left(\frac{1 - c_0}{2}\right)^{2j} c_0^{\,q - 2j} = e^{-3\gamma(1 - c_0)}\, I_0\left[3\gamma(1 - c_0)\right], \tag{9}$$

where $[x]$ is the integer part of $x$. However, the above equation leads to wrong predictions: a solution different from the trivial paramagnetic one, $Q_j(u) = \delta(u)$, appears at $\gamma = 1.16682$ with a negative energy. At $\gamma_{RS} = 1.29531$ the energy becomes positive, giving a lower bound for the true energy of the system.

2. 1RSB solution and the existence of non trivial fields

The numerical solution of Eq. (6) indicates that there exists a non-trivial solution in the region $\gamma > 0.82$ for sufficiently large values of the reweighting $y$. A careful look at the numerics shows that the probability distribution of $\eta_i$ takes the form

$$\rho(\eta) = t\, \delta(\eta - 1) + (1 - t)\, \tilde{\rho}(\eta), \tag{10}$$

that is, a fraction $t$ of cavity biases are trivial. The non-trivial cavity biases are characterized by a distribution $\tilde{\rho}$ which shrinks in the limit of large $y$, converging to a delta function in $\eta = 0$. The $y \to \infty$ limit is particularly relevant in the region up to $\gamma_c$.



3. The $y \to \infty$ limit: the complexity and the location of the phase transition

Looking at the self-consistency equations (6), the only way one can obtain a non-trivial distribution $Q(u)$ on the l.h.s. is when both $P^{(k)}(h) \neq \delta(h)$ and $P^{(k')}(g) \neq \delta(g)$. Moreover, the probability that $P^{(k)}(h) = \delta(h)$ equals the probability of picking up $k$ trivial distributions $Q(u)$, i.e. $t^k$. Putting everything in formulae, one has

$$1 - t = \sum_{k, k' = 0}^{\infty} e^{-6\gamma}\, \frac{(3\gamma)^k}{k!}\, \frac{(3\gamma)^{k'}}{k'!}\, (1 - t^k)(1 - t^{k'}) = \left(1 - e^{-3\gamma(1 - t)}\right)^2. \tag{11}$$

For $\gamma < \gamma_d = 0.818469$ the only solution is $t = 1$ (the system is a paramagnet), whereas above $\gamma_d$ a non-trivial solution appears.
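The appearance of the non-trivial solution of Eq. (11) can be observed numerically. The sketch below is only an illustration, under the assumption that plain fixed-point iteration from the fully non-trivial starting point $1 - t = 1$ is adequate; it iterates the map $q \mapsto (1 - e^{-3\gamma q})^2$ with $q = 1 - t$. Because the map is increasing in $q$, the iteration started at $q = 1$ decreases monotonically to the largest fixed point.

```python
import math


def nontrivial_fraction(gamma, iterations=10_000):
    """Largest solution q = 1 - t of Eq. (11) for p = 3, by fixed-point iteration."""
    q = 1.0
    for _ in range(iterations):
        q = (1.0 - math.exp(-3.0 * gamma * q)) ** 2
    return q


below = nontrivial_fraction(0.80)  # gamma < gamma_d: collapses to q = 0, i.e. t = 1
above = nontrivial_fraction(0.85)  # gamma > gamma_d: a finite fraction is non-trivial
```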

For $y = \infty$, a direct inspection of the numerical results shows that the cavity biases spontaneously divide into two categories, such that $\rho(\eta) = t\, \delta(\eta - 1) + (1 - t)\, \delta(\eta)$. In terms of $\mathcal{Q}[Q(u)]$ this corresponds to having

$$Q(u) = \begin{cases} \delta(u) & \text{with prob. } t \\ \frac{1}{2}\left[\delta(u - 1) + \delta(u + 1)\right] & \text{with prob. } 1 - t \end{cases} \tag{12}$$

which indeed is a fixed point under the iteration process (6) for $y = \infty$, provided the fraction of trivial biases $t$ satisfies Eq. (11).

Using the expressions of Refs. [?], for very large $y$ the free energy can be written as $\Phi(y) = \psi / y$. As expected, one finds that, as long as $\psi < 0$, the maximum of $\Phi(y)$ is located at $y = \infty$ and corresponds to a zero ground state energy. Consequently the complexity, or configurational entropy of zero-energy states, i.e. the normalised logarithm of the number of clusters of solutions, is given by

$$\Sigma = -\psi = \log(2) \left[1 - \frac{\lambda}{3} - e^{-\lambda} \left(1 + \frac{2}{3} \lambda\right)\right], \tag{13}$$

where $\lambda = 3\gamma(1 - t)$ satisfies the self-consistency equation

$$\lambda = 3\gamma \left(1 - e^{-\lambda}\right)^2. \tag{14}$$

The critical point, i.e. the SAT/UNSAT threshold $\gamma_c = 0.917935$, can be found as the $\gamma$ value where the complexity becomes zero (see Fig. 2). For $\gamma > \gamma_c$ the free energy $\Phi(y)$ has a positive maximum at a finite value of $y$, which corresponds to a positive ground state energy.



4. Expansion at large y: the ground state energy in the UNSAT phase (MAX-3-XORSAT)

In order to study the ground state energy for $\gamma > \gamma_c$ we need to take care of the leading corrections in the limit $y \gg 1$. For finite $y$, the distribution in Eq. (12) is no longer stable and we need to study a more general distribution of biases, which accounts for the appearance of a non-trivial contribution to the peak in $u = 0$ arising from frustrated interactions. This more general $\mathcal{Q}[Q(u)]$ is such that a fraction $t$ of messages is still completely trivial, $Q(u) = \delta(u)$, while non-trivial messages come from distributions of the following kind

$$Q(u) = \xi e^{-2y}\, \delta(u) + \frac{1 - \xi e^{-2y}}{2} \left[\delta(u - 1) + \delta(u + 1)\right]. \tag{15}$$

FIG. 2: The complexity as a function of $\gamma$.

The factor $e^{-2y}$ has been introduced in order to have finite $\xi$ in the limit of very large $y$. Moreover, from the numerical solution of Eq. (6) we observe that $\xi$ takes only integer values. Let us call $a_m$ the fraction of non-trivial distributions having $\xi = m$. The generating function $a(z) = \sum_m a_m z^m$ satisfies the equation

$$a(z) = \left[A\, a(z) + B z + 1 - A - B\right]^2, \tag{16}$$

where $A = \dfrac{\lambda e^{-\lambda}}{1 - e^{-\lambda}}$ and $B = \dfrac{\lambda^2 e^{-\lambda}}{2 \left(1 - e^{-\lambda}\right)}$.

Using the distributions in Eq. (15) one can obtain the free energy density $\Phi(y)$ up to the first correction

$$\Phi(y) = \frac{\psi}{y} - \frac{\omega}{y}\, e^{-2y}, \tag{17}$$

where

$$\omega = \frac{\lambda}{3} \left[1 - e^{-\lambda} \left(1 + \frac{3}{2} \lambda + 3 \lambda^2\right)\right] + \lambda\, \langle \xi \rangle \left[1 - e^{-\lambda} (1 + 2\lambda)\right]. \tag{18}$$

The mean value of $\xi$ can be easily obtained by differentiating Eq. (16) with respect to $z$ and then setting $z = 1$,

$$\langle \xi \rangle = a'(1) = \frac{2B}{1 - 2A} = \frac{\lambda^2 e^{-\lambda}}{1 - e^{-\lambda} (1 + 2\lambda)}, \tag{19}$$

and thus we have

$$\omega = \frac{\lambda}{3} \left[1 - e^{-\lambda} \left(1 + \frac{3}{2} \lambda\right)\right]. \tag{20}$$

Summarizing the statistical mechanics analysis, for any $\gamma > \gamma_d = 0.818469$ one can solve Eq. (14) for $\lambda$, deduce the large-$y$ behaviour of $\Phi(y)$ from Eq. (17) and maximize $\Phi(y)$ with respect to $y$. We find a critical value of $\gamma$, $\gamma_c = 0.917935$, where $\psi$ changes sign. For $\gamma < \gamma_c$, $\psi < 0$ and therefore the maximum of $\Phi(y)$ is found at $y = \infty$. The distribution of cavity biases is given by Eq. (12), and the maximum value of $\Phi$ is 0, showing that all hyperedges are satisfied (apart from maybe a vanishing fraction at large $N$).

At $\gamma = \gamma_c$, $\psi$ changes sign and, for $\gamma > \gamma_c$, $\Phi(y)$ has a maximum at a finite $y$, which shows that the ground state energy becomes strictly positive: it is no longer possible to satisfy all the constraints simultaneously.

The value of the energy for $\gamma$ slightly above $\gamma_c$ can be computed from the large-$y$ expansion. Moreover, such an expansion allows us to compute the complexity $\Sigma(E)$ of states of given energy $E$ by a Legendre transformation of the free energy. The complexity function $\Sigma(E)$ is obtained by solving $E = \partial_y (y \Phi)$ and $\Sigma = y^2 \partial_y \Phi$.

From Eq. (17) for $\Phi$ we get

$$E = 2 \omega\, e^{-2y} + O(e^{-4y}), \tag{21}$$

$$\Sigma = -\psi + (2y + 1)\, \omega\, e^{-2y}. \tag{22}$$

For $\gamma_d < \gamma < \gamma_c$, the constant $\psi$ is negative and one finds a complexity curve which starts positive at $E = 0$,

$$\Sigma(E) = -\psi - \frac{E}{2} \left[\log\left(\frac{E}{2\omega}\right) - 1\right]. \tag{23}$$

In particular, the number of lowest-lying states, which have an energy $E = 0$, scales with the number $N$ of spins as $\exp(-N\psi)$.

For $\gamma > \gamma_c$, expression (23) for the complexity still holds, but $\psi$ is positive. The regime of energies close to 0 where $\Sigma(E)$ is negative corresponds to a region where the average number of states is exponentially small in $N$. Therefore there are no states in this region in the typical sample. States appear above the ground state energy $E_0$, which is the point where $\Sigma(E)$ vanishes, and corresponds to the maximum of $\Phi(y)$. In Fig. 3 we show the analytic prediction for the ground state energy $E_0$ (lowest curve) together with numerical results from exact optimization on small systems. Numerical data are compatible with the analytic solution, which has been obtained by expanding around the critical point.



IV. RIGOROUS DERIVATION OF 
THRESHOLDS AND CLUSTERING 

We now show how the results of the previous Section can be rederived in a rigorous way. We will exploit concepts from graph theory, and all the calculations will be simple annealed averages, which are rigorous. All the formulas will be written for generic $p$, and the particular case $p = 3$ will be considered in order to make the connection with the calculations of Section III.

FIG. 3: Ground state energy for $\gamma$ values above the critical point $\gamma_c = 0.917935$. Numerical data seem to converge to the analytic prediction. Finite-size corrections roughly decrease as $1/N$.



The physical idea behind the graph theoretical derivation is the following. In a random hypergraph there are many variables with connectivities 0 and 1, whose cavity fields are null. A small fluctuation in the number of these variables induces very large fluctuations in physical observables, e.g. in the entropy. Thus the idea is to remove all these spins and to study the properties of the residual hypergraph, the core. We find that, on the core, sample-to-sample fluctuations are negligible, and this allows us to study its properties by means of very simple annealed averages.

The plan of this section is the following: (A) definition of some graph theoretical concepts, like random hypergraph and hyperloop; (B) introduction of the 'leaf removal' algorithm and solution of its dynamics (estimation of the $\gamma_d$ threshold); (C) statistical description of the hypergraph core (the part left by the application of the leaf removal algorithm); (D) calculation of $\gamma_c$, the SAT/UNSAT threshold; (E) derivation of GS clustering properties.



A. Random hypergraphs and hyperloops 

In the Hamiltonian (2) disorder enters in 2 ways: in the sign of the couplings $J_m = \pm 1$ and in the $M$ random p-uples of indices $\{i_1^m, \dots, i_p^m\}_{m = 1, \dots, M}$, which define the interaction topology. This topology has finite connectivity (each variable appears on average in $p\gamma$ interactions) and is locally tree-like (a Husimi tree for $p > 2$).

This topology can be represented as a hypergraph $\mathcal{G}$ made of a set of $N$ vertices (corresponding to the variables in the problem) and a set of $M$ hyperedges (corresponding to the constraints in the problem), each one connecting $p$ vertices. The disorder ensemble thus corresponds to all the possible ways one can place $M = \gamma N$ hyperedges among $N$ vertices, each hyperedge connecting $p$ vertices and carrying a random sign $J_m = \pm 1$.

Analogously to what happens with loops in usual graphs ($p = 2$), in a disordered model defined on a hypergraph ($p > 2$) frustration is induced by the presence of hyperloops [?], which are also called hypercycles in the literature [14]. The definition of a hyperloop can be given either in terms of the hypergraph $\mathcal{G}$ or in terms of the matrix $A$.

A hyperloop is a sub-hypergraph $\mathcal{C} \subset \mathcal{G}$, i.e. a set of hyperedges belonging to $\mathcal{G}$, such that every vertex has even degree (connectivity) in $\mathcal{C}$.

In terms of the matrix $A$ it corresponds to a set of rows $\mathcal{R}$ such that, for every column, the sum modulo 2 of the elements is zero, i.e. $\sum_{i \in \mathcal{R}} A_{ij} \bmod 2 = 0 \ \ \forall j$.

The presence of hyperloops is directly related to the presence of frustration in the system: if the product of the signs of the hyperloop interactions is negative, $\prod_{m \in \mathcal{C}} J_m = -1$, then not all such interactions can be satisfied at the same time. The critical point $\gamma_c$, where hyperloops percolate, is a $T = 0$ phase boundary for the p-spin glass models defined by Hamiltonian (2): for $\gamma < \gamma_c$ all the interactions can be satisfied and the GS energy is zero, while for $\gamma > \gamma_c$ the system is in a frustrated spin glass phase and GS of zero energy no longer exist.

The critical point $\gamma_c$ corresponds to the SAT/UNSAT threshold for the random p-XORSAT problem. In terms of the random linear system $Ax = y \bmod 2$, as long as $\gamma < \gamma_c$ solutions to the system will exist with probability 1 in the large $N$ limit for any $y$.



B. 'Leaf removal' algorithm 

Given a hypergraph, the leaf removal algorithm proceeds as follows [?]: as long as there is a vertex of degree 1, remove its unique hyperedge. A single step of the algorithm is illustrated in Fig. 4 for a graph ($p = 2$) and for a hypergraph ($p = 3$). Very similar algorithms have been recently studied in [?].

FIG. 4: A single step of the 'leaf removal' algorithm on a graph ($p = 2$, top) and on a hypergraph ($p = 3$, bottom).

During the whole process the remaining hypergraph is still a random one, since no correlation can arise among the hyperedges if it was not present at the beginning. When there are no more vertices of degree 1 in the hypergraph the process stops, and we call core the resulting hypergraph, cleared of all isolated vertices.
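The procedure just described can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the original work; the function name `leaf_removal` and the representation of hyperedges as tuples of vertex labels are assumptions.

```python
from collections import defaultdict


def leaf_removal(hyperedges):
    """Return the sorted indices of the hyperedges that survive in the core.

    hyperedges: list of tuples of vertex labels (distinct within each tuple).
    """
    incident = defaultdict(set)  # vertex -> indices of its remaining hyperedges
    for e, vertices in enumerate(hyperedges):
        for v in vertices:
            incident[v].add(e)

    alive = set(range(len(hyperedges)))
    leaves = [v for v, edges in incident.items() if len(edges) == 1]
    while leaves:
        v = leaves.pop()
        if len(incident[v]) != 1:
            continue  # the degree of v changed since it was queued
        (e,) = incident[v]
        alive.discard(e)  # remove the unique hyperedge of the leaf
        for w in hyperedges[e]:
            incident[w].discard(e)
            if len(incident[w]) == 1:
                leaves.append(w)  # w has just become a leaf
    return sorted(alive)


# A chain of hyperedges is removed completely.
chain = [(0, 1, 2), (2, 3, 4), (4, 5, 6)]
# Four hyperedges in which every vertex has degree 2 (a hyperloop),
# plus one dangling hyperedge attached to vertex 5.
loop_plus_leaf = [(0, 1, 2), (0, 3, 4), (1, 3, 5), (2, 4, 5), (5, 6, 7)]
```

In the second example only the dangling hyperedge is removed, and the four hyperedges forming the hyperloop make up the core.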

The leaf removal algorithm is not able to break up any hyperloop, since each vertex in the hyperloop has at least degree 2. The $\gamma$ value where the core size becomes different from zero, let us call it $\gamma_d$, is certainly smaller than the percolation point of hyperloops $\gamma_c$ (for $p = 2$ these two values coincide).

The evolution of a hypergraph under the application of the leaf removal algorithm can be described in terms of the probability, $f_k(t)$, of finding a vertex of degree $k$ after having removed $tN$ hyperedges, where the 'time' $t$ ranges from 0 to $\gamma$. The initial condition is $f_k(0) = e^{-p\gamma} \frac{(p\gamma)^k}{k!}$ and the evolution equations read (see Ref. [?] for a detailed derivation of similar equations)

$$
\begin{aligned}
\frac{d f_0(t)}{dt} &= (p - 1)\, \frac{f_1(t)}{m(t)} + 1, \\
\frac{d f_1(t)}{dt} &= (p - 1)\, \frac{2 f_2(t) - f_1(t)}{m(t)} - 1, \\
\frac{d f_k(t)}{dt} &= (p - 1)\, \frac{(k + 1) f_{k+1}(t) - k f_k(t)}{m(t)} \qquad \forall k \ge 2,
\end{aligned}
\tag{24}
$$

where $m(t) = \sum_k k f_k(t) = p(\gamma - t)$, since the mean degree decreases linearly with time (we remove one interaction per step) and vanishes at $t = \gamma$.

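Thanks to the simplicity of the leaf removal process, the degree distribution always remains Poissonian for degrees larger than 1, with a time-dependent average $\lambda(t)$,

$$f_k(t) = e^{-\lambda(t)}\, \frac{\lambda(t)^k}{k!} \qquad \forall k \ge 2. \tag{25}$$

The solution to Eqs. (24) reads

$$\lambda(t) = p \left[\gamma (\gamma - t)^{p - 1}\right]^{1/p}, \tag{26}$$

$$f_1(t) = \lambda(t) \left[e^{-\lambda(t)} - 1 + \left(\frac{\lambda(t)}{p\gamma}\right)^{\frac{1}{p - 1}}\right], \tag{27}$$

$$f_0(t) = 1 - \sum_{k=1}^{\infty} f_k(t). \tag{28}$$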



The leaf removal algorithm stops when there are no more vertices of degree 1, so one can predict the resulting core by fixing $\lambda(t) = \lambda^*$, where $\lambda^*$ is the largest zero of the equation $f_1 = 0$, or equivalently

$$e^{-\lambda^*} - 1 + \left(\frac{\lambda^*}{p\gamma}\right)^{\frac{1}{p - 1}} = 0. \tag{29}$$

More precisely, $\lambda^*$ is the first zero of Eq. (27) one finds decreasing $\lambda$, starting from the initial value $\lambda(0) = p\gamma$, but this always coincides with the largest zero. Note that once we define $m = \left[\lambda^* / (p\gamma)\right]^{1/(p-1)}$, Eq. (29) can be rewritten as

$$m = 1 - \exp\left(-p\gamma\, m^{p - 1}\right), \tag{30}$$

which is nothing but the equation for the magnetization in the ferromagnetic state or, equivalently, the equation for the backbone size in any cluster. Note that Eq. (29) with $p = 3$ is identical to Eq. (14), which indeed determines the mean connectivity of the sub-hypergraph made of hyperedges with non-trivial biases.



In the $p = 2$ case the leaf removal algorithm is able to delete all the edges only for tree-like graphs. As soon as there are loops in the graph, a core containing these loops arises (see Fig. 5). In a random graph the leaf removal transition coincides with the percolation one at $\gamma_p = 1/2$. The shape of the function $f_1(\lambda)$ is shown in Fig. 6. For $\gamma < \gamma_p$ there is only one zero, at $\lambda^* = 0$, while for $\gamma > \gamma_p$, $\lambda^* > 0$ and a core arises, whose size grows as $(\gamma - \gamma_p)^2$ near the critical point.

FIG. 5: On graphs ($p = 2$) the leaf removal algorithm is not able to break loops, which thus remain in the residual core (panels: tree-like graph; graph with loops and its core).

FIG. 6: The function $f_1(\lambda)/\lambda$ for $p = 2$.

For $p > 2$ the percolation transition, taking place at $\gamma_p = \frac{1}{p(p - 1)}$, does not affect the leaf removal algorithm at all: the algorithm is able to delete all the hyperedges, even those forming loops (but not those forming hyperloops!), far beyond $\gamma_p$ (see Fig. 7).

FIG. 7: For $p > 2$ the leaf removal algorithm is able to break loops (but not hyperloops!).

FIG. 8: The function $f_1(\lambda)/\lambda$ for $p = 3$. Inset: function $\lambda^*(\gamma)$ for $p = 3$.

The leaf removal transition takes place at $\gamma_d$, which is defined as the first $\gamma$ value where a second solution to Eq. (29) appears. For $p = 3$ we have $\gamma_d = 0.818469$. The transition is first order and, at the critical point, the core already occupies a finite fraction of the system. In Fig. 8 we show the function $f_1(\lambda)$ for $p = 3$. It is clear (see inset of Fig. 8) that when $\lambda^*(\gamma)$ becomes different from zero it directly jumps to a finite value: $\lambda^*(\gamma_d) = 1.25643$ for $p = 3$.
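The quoted values of $\gamma_d$ and $\lambda^*(\gamma_d)$ can be reproduced numerically. The sketch below relies on an observation that is not spelled out in the text: Eq. (29) is equivalent to $\gamma = \lambda / [p (1 - e^{-\lambda})^{p-1}]$, so for $p > 2$ a non-zero solution first appears at the minimum of this function over $\lambda > 0$. The minimum is located with a simple ternary search; the function names and the search interval are illustrative choices.

```python
import math


def gamma_of_lambda(lam, p):
    """Value of gamma for which lam solves Eq. (29)."""
    return lam / (p * (1.0 - math.exp(-lam)) ** (p - 1))


def dynamic_threshold(p, lo=1e-3, hi=20.0, tol=1e-12):
    """Minimize gamma_of_lambda by ternary search (assumes a single minimum).

    Returns (gamma_d, lambda at the minimum).
    """
    while hi - lo > tol:
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if gamma_of_lambda(a, p) < gamma_of_lambda(b, p):
            hi = b
        else:
            lo = a
    lam = 0.5 * (lo + hi)
    return gamma_of_lambda(lam, p), lam


gamma_d, lambda_d = dynamic_threshold(3)
```

For $p = 3$ this should agree with the values $\gamma_d = 0.818469$ and $\lambda^*(\gamma_d) = 1.25643$ given above, up to the numerical tolerance of the search.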



C. Statistical description of the core 

Once the leaf removal process has come to an end, the distribution of connectivities on the core is a truncated Poissonian

$$P_c(k) = \begin{cases} 0 & \text{for } k = 0, 1 \\ \dfrac{1}{e^{\lambda^*(\gamma)} - 1 - \lambda^*(\gamma)}\, \dfrac{\lambda^*(\gamma)^k}{k!} & \text{for } k \ge 2 \end{cases} \tag{31}$$

The number of vertices $N_c$ and the number of hyperedges $M_c$ in the core can be expressed in terms of $p$, $\gamma$ and $\lambda^*(\gamma)$ as

$$N_c(\gamma) = N \sum_{k=2}^{\infty} f_k(\lambda^*) = N \left[1 - (1 + \lambda^*)\, e^{-\lambda^*}\right], \tag{32}$$

$$M_c(\gamma) = M - N t^* = N \left[\frac{1}{\gamma} \left(\frac{\lambda^*}{p}\right)^{p}\right]^{\frac{1}{p - 1}} = \gamma N \left(1 - e^{-\lambda^*}\right)^{p} = N\, \frac{\lambda^*}{p} \left(1 - e^{-\lambda^*}\right). \tag{33}$$

The first of these equations has a simple interpretation: the number of vertices in the core is nothing but the number of vertices with a degree larger than 1 after the application of the leaf removal algorithm. The second equation states that the number of hyperedges left is the initial one minus the number of steps the leaf removal algorithm has been run (during each step only one hyperedge is deleted). The running time $t^*$ is the solution to Eq. (26) with $\lambda^*$ on the left hand side. The last two, more compact, expressions for $M_c$ have been obtained with the use of Eq. (29). The lower curves in Fig. 9 show the normalized number of vertices $N_c / N$ and number of interactions $M_c / N$ in the core as a function of $\gamma$, for $p = 3$.

FIG. 9: From bottom to top (on the left): for $p = 3$, normalized number of hyperedges and vertices in the core, and fraction of frozen sites, i.e. magnetization (or backbone) in a state.



It is natural now to study the residual problem on the core, $A_c x_c = y_c \bmod 2$, where $A_c$ is the $M_c \times N_c$ sparse random matrix obtained from $A$ by deleting all the rows corresponding to removed interactions and all empty columns. In the next subsection we will derive a general result that, when applied to the problem on the core, gives a necessary and sufficient condition for the existence of solutions to $A_c x_c = y_c \bmod 2$. Then we will show that, from a solution in the core, a solution for the original system can always be constructed.

D. Calculation of the $\gamma_c$ threshold

Let us call $\mathcal{N}_{J,N,M}$ the number of GS for a given disorder realization $J$ (i.e. a given hypergraph and coupling signs) with $N$ variables and $M$ interactions. We will show that, in the large $N$ limit, if the hypergraph does not contain any vertex of degree less than 2, $\mathcal{N}_{J,N,M}$ is a self-averaging quantity, that is, it does not fluctuate when changing $J$.

In order to show self-averaging we will prove that, on hypergraphs ($p > 2$) with minimum degree at least 2, the following equalities hold

$$\overline{\mathcal{N}_{J,N,M}} = 2^{N - M}, \qquad \lim_{N \to \infty} \frac{\overline{\mathcal{N}^2_{J,N,M}}}{\overline{\mathcal{N}_{J,N,M}}^{\,2}} = 1, \tag{34}$$

where the overline stands for the average over the disorder ensemble, that is, over the ways of choosing $M$ hyperedges among $\binom{N}{p}$ and the ways of giving them a sign $J_m = \pm 1$. The above equalities state that the probability distribution of $\mathcal{N}_{J,N,M}$ over the disorder ensemble is a delta function, and thus the quenched average equals the annealed one

$$\overline{\log \mathcal{N}_{J,N,M}} = \log \overline{\mathcal{N}_{J,N,M}} = \log(2)\, (N - M). \tag{35}$$

Given the definition

$$\mathcal{N}_{J,N,M} = \sum_{\vec{\sigma}} \prod_{m=1}^{M} \delta\left(\sigma_{i_1^m} \cdots \sigma_{i_p^m} = J_m\right), \tag{36}$$

the first moment is trivially given by

$$\overline{\mathcal{N}_{J,N,M}} = \overline{\sum_{\vec{\sigma}} \prod_{m=1}^{M} \delta\left(\sigma_{i_1^m} \cdots \sigma_{i_p^m} = J_m\right)} = 2^{N - M}, \tag{37}$$

since, for every given spin configuration and topology, the probability that the coupling signs satisfy all the $M$ interactions is exactly $2^{-M}$.

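The second moment is given by

$$
\begin{aligned}
\overline{\mathcal{N}^2_{J,N,M}} &= \overline{\sum_{\vec{\sigma}, \vec{\sigma}'} \prod_{m=1}^{M} \delta\left(\sigma_{i_1^m} \cdots \sigma_{i_p^m} = J_m\right) \delta\left(\sigma'_{i_1^m} \cdots \sigma'_{i_p^m} = J_m\right)} \\
&= \overline{\sum_{\vec{\sigma}} \prod_{m=1}^{M} \delta\left(\sigma_{i_1^m} \cdots \sigma_{i_p^m} = J_m\right) \sum_{\vec{\tau}} \prod_{m=1}^{M} \delta\left(\tau_{i_1^m} \cdots \tau_{i_p^m} = 1\right)} \\
&= 2^{N - M}\; \overline{\sum_{\vec{\tau}} \prod_{m=1}^{M} \delta\left(\tau_{i_1^m} \cdots \tau_{i_p^m} = 1\right)},
\end{aligned}
\tag{38}
$$

where $\tau_i = \sigma_i \sigma'_i$ and the last expression is nothing but the annealed average of the partition function at $T = 0$ for a system where all the coupling signs have been set to 1, i.e. a ferromagnetic model. Such an average can be computed by standard saddle point integration and the final result is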


1 M 

Fn ' m = iv h >oo n log £ n • • • %™ = i) = e p ( fc ) io § 



+ x k _ 



(39) 



r 



where P(fc) is the distribution of connectivities in the 
hypergraph and X-L. , X solve the following equations 



x + + X- 



x + 



E 

k 

E 



kP(k) x 



k-l 



.k-l 



kP{k) xl' 1 - x 
~{k) S 



Jt-i 



p-i 



ir-i 



(40) 
(41) 



Here (k) = J2k kP{k) = is the mean connectivity. 
When more than one solution to Eqs.( ^0| , ^l| ) exist, the 
one maximizing Eq. ( |39| ) must be chosen. The value of 
x+ (resp. X-) is proportional to the fraction of variables 
taking values 1 (resp. -1) in the set of configurations 
which maximize the sum in Eq. (^9|). Then the typical 
magnetization of this model is given by m — . 

Solutions to Eqs. (40, 41) can be classified depending on
the value of the magnetization m. In full generality there are
3 solutions: a first symmetric one ($x_+ = x_-$) with m = 0,
a second one with large magnetization and a third one
with an intermediate value of m. For some choices of
$P(k)$ (e.g. a Poissonian) solutions with m > 0 may exist
only for $\gamma$ large enough. The solution with intermediate
magnetization always corresponds to a minimum of $F_{N,M}$
and can in general be neglected.

The symmetric solution $x_+ = x_- = 2^{-1/p}$ always exists
and gives $F_{N,M} = \log(2)\,(1 - \gamma)$. For p > 2 and
P(0) = P(1) = 0, i.e. for hypergraphs with minimum
degree 2, the solution with large magnetization also exists
for any $\gamma$ value and has $x_+ = 1$, $x_- = 0$ and $F_{N,M} = 0$.
As expected, the intermediate solution, when it exists,
has $F_{N,M} < 0$.

Then, for p > 2 and P(0) = P(1) = 0, we can conclude
that the average in the last term of Eq. (38) equals
$e^{N F_{N,M}}$ (the coefficient can be easily calculated
and is exactly 1). Thus, equalities in Eq. (34) hold and
the number of GS is a self-averaging quantity.
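
The two moment identities can be checked by brute force on a toy instance. The sketch below is only an illustration under stated assumptions: the topology is kept fixed (only the coupling signs are averaged), the tiny hypergraph with a repeated hyperedge is a hypothetical example chosen so that the number of GS actually fluctuates, and the function names are ours.

```python
from itertools import product
from math import prod

def count_ground_states(edges, signs, n):
    """Number of spin configurations with sigma_{i1}...sigma_{ip} = J_m for every m."""
    return sum(
        all(prod(sigma[i] for i in edge) == J for edge, J in zip(edges, signs))
        for sigma in product((1, -1), repeat=n)
    )

def moments(edges, n):
    """First and second moment of the number of GS, averaged over all coupling signs."""
    counts = [count_ground_states(edges, signs, n)
              for signs in product((1, -1), repeat=len(edges))]
    first = sum(counts) / len(counts)
    second = sum(c * c for c in counts) / len(counts)
    return first, second

# Hypothetical toy instance: N = 5 spins, M = 3 interactions with p = 3.
edges = [(0, 1, 2), (0, 1, 2), (2, 3, 4)]
n = 5
first, second = moments(edges, n)
ferro = count_ground_states(edges, (1, 1, 1), n)  # all couplings set to 1
print(first, second, ferro)
```

Here the first moment equals $2^{N-M}$ whatever the topology, while the second moment equals $2^{N-M}$ times the number of solutions of the ferromagnetic system, which is the structure used in the text.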

Since the core generated by the leaf removal algorithm
has minimum degree 2, we may apply the above result,
and find that the SAT/UNSAT threshold is given by the
condition

$$N_c(\gamma_c) = M_c(\gamma_c) \quad \Longrightarrow \quad 1 - (1 + \lambda_c)\, e^{-\lambda_c} = \frac{\lambda_c}{p} \left(1 - e^{-\lambda_c}\right), \qquad (42)$$

where $\lambda_c = \lambda^*(\gamma_c)$. For $\gamma < \gamma_c$ there are $2^{N_c - M_c}$ solutions
(i.e. unfrustrated GS) in the core, while for $\gamma > \gamma_c$
there is none. For p = 3, the solution to Eq. (42) gives
$\lambda_c = 2.14913$ and $\gamma_c = 0.917935$.
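
The threshold values can be reproduced numerically. The following is a minimal sketch, not the authors' code. It assumes that the threshold condition reads $1 - (1+\lambda)e^{-\lambda} = (\lambda/p)(1 - e^{-\lambda})$, that the leaf removal analysis gives the core relation $\lambda = p\gamma(1 - e^{-\lambda})^{p-1}$, and that the dynamic threshold is the smallest $\gamma$ for which this relation has a positive root. The function names are hypothetical.

```python
import math

def gamma_of_lambda(lam, p):
    # Assumed core relation: lam = p * gamma * (1 - exp(-lam))**(p - 1)
    return lam / (p * (1.0 - math.exp(-lam)) ** (p - 1))

def bisect(f, lo, hi, iters=200):
    # Plain bisection; assumes f(lo) and f(hi) have opposite signs
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def static_threshold(p):
    # Positive root of N_c = M_c, then gamma_c from the core relation
    g = lambda lam: (1 - (1 + lam) * math.exp(-lam)
                     - (lam / p) * (1 - math.exp(-lam)))
    lam_c = bisect(g, 0.5, 2.0 * p)
    return lam_c, gamma_of_lambda(lam_c, p)

def dynamic_threshold(p):
    # Minimum of gamma_of_lambda: stationary point solves e^lam - 1 = (p - 1) * lam
    h = lambda lam: math.exp(lam) - 1 - (p - 1) * lam
    lam_d = bisect(h, 1e-3, 10.0 * p)
    return lam_d, gamma_of_lambda(lam_d, p)

for p in (3, 4, 5, 6):
    print(p, round(dynamic_threshold(p)[1], 6), round(static_threshold(p)[1], 6))
```

Bisection is used only because both functions change sign once on the chosen brackets for p > 2; any scalar root finder would do.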

For any given solution in the core, a solution for the
whole original system can be easily reconstructed.
Indeed, we reintroduce in the system the interactions
removed during the leaf removal process, but in reverse
order (i.e. the last removed is the first to be
reintroduced). At each step, together with one interaction,
at least one variable is reintroduced in the system (the
variable having degree 1 when that interaction was
removed) and this variable must be set so as to satisfy
the interaction. Very often more than one variable per
step is reintroduced, allowing for multiple and equivalent
choices. This redundancy is what makes the total
number of solutions larger than the number of solutions in
the core (see below).
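
The reconstruction procedure can be sketched in a few lines. This is an illustrative implementation under our own conventions, not the authors' code: equations are stored as `(variables, rhs)` pairs over GF(2), the names `leaf_removal`, `reconstruct` and `satisfied` are hypothetical, and every free variable met during reconstruction is arbitrarily set to 0 (either value would work).

```python
from collections import defaultdict

def leaf_removal(eqs, n_vars):
    """Peel equations containing a degree-1 variable; return (removed, core)."""
    occ = defaultdict(set)            # variable -> live equations containing it
    for m, (vs, _) in enumerate(eqs):
        for v in vs:
            occ[v].add(m)
    alive = set(range(len(eqs)))
    leaves = [v for v in range(n_vars) if len(occ[v]) == 1]
    removed = []                      # (equation, leaf variable) in removal order
    while leaves:
        v = leaves.pop()
        if len(occ[v]) != 1:          # degree changed since v was queued
            continue
        m = next(iter(occ[v]))
        alive.discard(m)
        removed.append((m, v))
        for u in eqs[m][0]:
            occ[u].discard(m)
            if len(occ[u]) == 1:
                leaves.append(u)
    return removed, sorted(alive)

def reconstruct(eqs, n_vars, removed, core_solution):
    """Extend a core solution by reinserting equations in reverse removal order."""
    x = dict(core_solution)
    for m, leaf in reversed(removed):
        vs, y = eqs[m]
        for u in vs:                  # other newly reintroduced variables are free
            if u != leaf and u not in x:
                x[u] = 0
        x[leaf] = (y + sum(x[u] for u in vs if u != leaf)) % 2
    return [x.get(v, 0) for v in range(n_vars)]

def satisfied(eqs, x):
    return all(sum(x[v] for v in vs) % 2 == y for vs, y in eqs)

# Hypothetical example: a 4-equation core (every variable has degree 3)
# plus one dangling equation that leaf removal peels off.
eqs = [((0, 1, 2), 0), ((0, 1, 3), 0), ((0, 2, 3), 0), ((1, 2, 3), 0),
       ((3, 4, 5), 1)]
removed, core = leaf_removal(eqs, 6)
x = reconstruct(eqs, 6, removed, {0: 0, 1: 0, 2: 0, 3: 0})
print(core, removed, x, satisfied(eqs, x))
```

The leaf variable of a reinserted equation is never assigned beforehand, because every other equation containing it was removed earlier and is therefore reinserted later.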

In the table below we report the thresholds $\gamma_d$ and $\gamma_c$
for some p values.

  p    $\gamma_d$     $\gamma_c$
  2    1/2          1/2
  3    0.818469     0.917935
  4    0.772278     0.976770
  5    0.701780     0.992438
  6    0.637080     0.997380




E. Clustering of ground states 




[Fig. 10 labels: # clusters = $e^{N\Sigma}$; • = core solution; $N_c - M_c$; $N - N_c$; $N_c$; non-core variables give the intra-cluster entropy.]



Let us come back to the problem of clustering solutions
before the SAT/UNSAT threshold ($\gamma < \gamma_c$). In
this region the system is not frustrated and then a gauge
transformation setting all coupling signs to 1 can always
be found: Given an unfrustrated GS $\sigma^0$, a possible gauge
transformation is $\sigma'_i = \sigma_i \sigma^0_i$ and $J'_m = J_m \sigma^0_{i_1^m} \cdots \sigma^0_{i_p^m} = 1$.

Thanks to this, in the rest of the paper we will consider
only a ferromagnetic system ($J_m = 1\ \forall m$), which
corresponds to the linear system $Ax = 0 \bmod 2$.

The solutions to the linear system $Ax = 0 \bmod 2$ form
a group: The sum of 2 solutions is still a solution and
the null element is the solution x = 0. The symmetry
group is telling us that if one looks at the configurational
space sitting on a reference GS, the set of GS will look
the same, whatever the reference GS is. An immediate
consequence of this symmetry is that, if GS form clusters,
these clusters must be all of the same size.
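
A brute-force check on a small homogeneous system illustrates the group property and its consequence. The system below is a hypothetical example (3 equations, 6 variables, p = 3), and the helper names are ours.

```python
from itertools import product

def solutions(rows, n):
    """All x in {0,1}^n with sum_{i in row} x_i = 0 mod 2 for every row."""
    return [x for x in product((0, 1), repeat=n)
            if all(sum(x[i] for i in row) % 2 == 0 for row in rows)]

def xor(a, b):
    """Sum of two configurations mod 2."""
    return tuple(u ^ v for u, v in zip(a, b))

def distance_profile(ref, sols):
    """Sorted Hamming distances from a reference solution to all solutions."""
    return sorted(sum(u != v for u, v in zip(ref, s)) for s in sols)

rows = [(0, 1, 2), (2, 3, 4), (1, 3, 5)]
sols = solutions(rows, 6)
sol_set = set(sols)

closed = all(xor(a, b) in sol_set for a in sols for b in sols)
profiles = {tuple(distance_profile(ref, sols)) for ref in sols}
print(len(sols), closed, len(profiles))
```

Since translating by any solution maps the solution set onto itself, the distance profile seen from every reference solution is the same, which is why all clusters must have equal size.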

For $\gamma < \gamma_c$, hyper loops are absent and the total number
of GS (or solutions) is always given by $2^{N-M}$, i.e.
their entropy is $S(\gamma) = \log(2)\,(1 - \gamma)$. Let us divide the
N variables in 2 sets: $x_c$ represents the $N_c$ variables in
the core, and $x_{nc}$ the $N - N_c$ variables in the non-core
part of the hypergraph, that is variables corresponding to
vertices that remain isolated at the end of the leaf removal
process. Thus also the entropy can be divided in 2 parts.
One part is given by the solutions in the core, that is by
the possible assignments of $x_c$,

$$S_c(\gamma) = \log(2)\, \frac{N_c(\gamma) - M_c(\gamma)}{N}, \qquad (43)$$

which is non-negative for $\gamma_d < \gamma < \gamma_c$. The other part is
given by the possible multiple assignments of $x_{nc}$ during
the reconstruction process

$$S_{nc}(\gamma) = S(\gamma) - S_c(\gamma). \qquad (44)$$



This separation of the entropy in 2 parts is physically 
relevant, and we will show here that it corresponds to 
the proper clustering of the solutions. 

The physical picture we have in mind is sketched in
Fig. 10. For $\gamma_d < \gamma < \gamma_c$, the solutions of $Ax = y \bmod 2$,
or equivalently the ground states of (||), spontaneously
form clusters. By definition, two solutions
having a finite Hamming distance d, i.e. $d/N \to 0$ for
$N \to \infty$, are in the same cluster, while two solutions in
different clusters must have an extensive distance, that
is $d/N \sim O(1)$ for large N.

FIG. 10: Schematic picture of the clustering of solutions for $\gamma_d < \gamma < \gamma_c$.



By virtue of the property stated at the beginning of
this subsection, all the clusters have the same size. Their
number is $e^{N \Sigma(\gamma)}$, where $\Sigma(\gamma)$ is called complexity or
configurational entropy. We will show that the number
of clusters equals the number of solutions in the core, that is

$$\Sigma(\gamma) = S_c(\gamma). \qquad (45)$$



The intra-cluster entropy, i.e. the normalized logarithm
of the cluster size, is then given by the non-core entropy
$S_{nc}(\gamma) = S(\gamma) - S_c(\gamma) = S(\gamma) - \Sigma(\gamma)$. For p = 3 these
entropies are shown in Fig. 11.
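
The three entropies can be evaluated numerically as functions of $\gamma$. The sketch below is hedged: it assumes the core sizes $N_c/N = 1 - (1+\lambda^*)e^{-\lambda^*}$ and $M_c/N = (\lambda^*/p)(1 - e^{-\lambda^*})$, with $\lambda^*$ the largest root of $\lambda = p\gamma(1 - e^{-\lambda})^{p-1}$. These expressions come from the leaf removal analysis rather than from this subsection, and the function names are hypothetical.

```python
import math

def largest_root(p, gamma, iters=5000):
    # Fixed-point iteration started from above; it converges to the largest
    # root of lam = p * gamma * (1 - exp(-lam))**(p - 1), or to 0 if none exists
    lam = p * gamma
    for _ in range(iters):
        lam = p * gamma * (1 - math.exp(-lam)) ** (p - 1)
    return lam

def entropies(p, gamma):
    """Return (total entropy S, complexity Sigma, intra-cluster entropy S_nc)."""
    lam = largest_root(p, gamma)
    n_core = 1 - (1 + lam) * math.exp(-lam)      # assumed N_c / N
    m_core = (lam / p) * (1 - math.exp(-lam))    # assumed M_c / N
    s_total = math.log(2) * (1 - gamma)
    sigma = math.log(2) * (n_core - m_core)
    return s_total, sigma, s_total - sigma

for gamma in (0.85, 0.90, 0.917935):
    print(gamma, entropies(3, gamma))
```

For p = 3 the complexity obtained this way is positive between the two thresholds and vanishes at $\gamma_c$, while the intra-cluster entropy stays positive.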




FIG. 11: Total entropy $S(\gamma)$ and configurational entropy $\Sigma(\gamma)$
for p = 3.

The proof of Eq. (45) is given in 2 steps. First we show
that all the solution assignments of the core variables
$x_c$ are "well separated", that is the distance among any
pair of them is extensive. This is what gives rise to the
clustering, with a number of clusters which is at least as
large as the number of core solutions ($\Sigma \ge S_c$). Then we
show that, for any fixed $x_c$, all possible assignments of
non-core variables $x_{nc}$ belong to the same cluster, and so
$\Sigma = S_c$.
The first step is accomplished by calculating the
probability distribution of the distance among any two
solutions in the core. Thanks to the group property, we
can restrict the calculation fixing one solution to the null
vector 0. For simplicity we have performed an annealed
average

$$S(d, \gamma) = \lim_{N \to \infty} \frac{1}{N} \log \overline{\sum_{\sigma} \delta\Big(\sum_i \sigma_i = N_c - 2d\Big) \prod_{m} \delta\left(\sigma_{i_1^m} \cdots \sigma_{i_p^m} = 1\right)}, \qquad (46)$$



which gives an upper bound to the exact result. The
expression for this entropy is given by Eq. (39), where
$x_\pm$ comes from the solution of Eq. (40), keeping the ratio
fixed. 



N, 



to 




FIG. 12: Entropy of distances among solutions in the core for 
p = 3 (in the annealed approximation) . 



In Fig. 12 we plot the resulting entropy as a function
of the distance d, for p = 3 and some values of $\gamma$. For
$\gamma_d < \gamma < \gamma_c$ the entropy is negative for $0 < d < d_{min}(\gamma)$,
and so $d_{min}(\gamma)$ is a lower bound on the minimum distance
among any two solutions in the core. This minimum
distance is shown for p = 3 in Fig. 13.

Then all the $e^{N S_c}$ core solutions are well separated,
and can be represented as the centers of the clusters (see
Fig. 10). It remains to be proven that, for any fixed
$x_c$, the solution assignments of $x_{nc}$ form a single cluster.
Thus no further clustering is present and the picture of
Fig. 10 is correct.

This last proof is given in the Appendix, and it is based
on an algorithm which allows one to change the value
of any variable in $x_{nc}$ by simply adjusting O(1) other
variables in $x_{nc}$. This shows that all the solutions in
one cluster are connected in the following sense: one
solution can be reached from any other one by a sequence
of moves, where each move involves flipping only a finite
number of spins.

FIG. 13: Lower bound for the minimum distance among any 
2 solutions in the core for p = 3. 



V. CONCLUSION AND DISCUSSION 

In this work we have solved, with two alternative meth- 
ods, the p-XORSAT model, which corresponds to the 
zero-temperature limit of the diluted p-spin model. 

Increasing the $\gamma$ parameter (number of interactions per
variable) the model undergoes two phase transitions. At
$\gamma_d$, solutions to the p-XORSAT problem (i.e. ground
states for the p-spin model) spontaneously form an
exponentially large number of clusters, thus giving a
finite configurational entropy. At $\gamma_c$, frustration percolates
throughout the system, and consequently the number of
clusters (and solutions) goes to zero, and the ground
state energy becomes positive. $\gamma_c$ corresponds to the
SAT/UNSAT threshold. These exact results perfectly
agree with previous replica calculations |], [10], [11] and
may suggest new approaches for finding mathematical
bases to Parisi's theory of spin glasses [12].

The use of the cavity method combined with a rigorous
derivation based on the topological properties of
the interaction hypergraph allows us to establish some
interesting links among distributions of cavity fields on
a given variable and the position of the corresponding
vertex in the hypergraph. In particular all the variables
with a non-trivial distribution of cavity fields belong to
the 'frozen' part of the hypergraph (see Appendix), that
is to the core and to the part that can be uniquely fixed,
once an assignment to core variables has been chosen.
The 'frozen' part is exactly the backbone of a cluster
(variables which take the same value for all the solutions
in the cluster) and its size is given by the largest
solution to Eq. (30). The rest of the hypergraph, the 'floppy'
part, only contains paramagnetic variables, that is
variables always having a null cavity field.

APPENDIX A

In this appendix we show that assignments of non-core
variables $x_{nc}$ are not clustered. To this end, we
define an algorithm which allows one to flip any non-core
variable, by simply adjusting O(1) other non-core
variables. With this algorithm one can move through all
the assignments by doing finite steps, thus proving
that non-core solutions form a single cluster.

Let us fix the core variables $x_c$ to any solution, and
call them 'frozen'. All the variables belonging to at least
one equation where the other p - 1 variables are already
frozen must be frozen too (see e.g. the dashed triangle
in Fig. 14, where the dashed blobs represent the frozen
core). In this way one is able to freeze a number of
variables $m(\gamma) N$, where $m(\gamma)$ turns out to coincide with the
largest solution of Eq. (30), that is with the magnetization
in the ferromagnetic state or the backbone in a
generic cluster. For p = 3 the function $m(\gamma)$ is shown in
Fig. H (upper curve) . 

After having fixed all the variables one could, one is
left with the 'floppy' part of the hypergraph. The typical
situation is sketched in Fig. 14, where the dashed part is
frozen (hereafter we refer only to the p = 3 case for the
sake of clarity). All the interactions involving both frozen
and floppy variables (those which form the boundary
between the frozen and the floppy part of the hypergraph)
must contain 2 floppy and 1 frozen variables, otherwise
(2 frozen and 1 floppy) that interaction would become
frozen as well and would no longer be on the boundary.

The numbers in Fig. 14 have been assigned during a
slightly different leaf removal process with the following
rule. Starting with the original hypergraph, the number
"1" is given to all the vertices of degree less than 2
(isolated vertices and leaves) and their hyperedges are deleted.
Then, in the new hypergraph, the number "2" is given
to all vertices of degree less than 2 and their hyperedges
are deleted. And so on. We call these numbers the depth




FIG. 14: The bold tree-like structure is a possible seaweed 
(see text) in order to flip the variable on the circled vertex and 
still keep all the interactions satisfied. Note that the seaweed 
passes through at most 2 vertices on the same interaction. 



of a vertex: Vertices of depth 1 represent the 'external 
boundary' or the 'surface' of the hypergraph. 
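
The depth assignment rule can be sketched as follows. This is an illustration with hypothetical names (`assign_depths`, `edges_below`), not the authors' code. Vertices belonging to the core never receive a depth and are simply absent from the returned dictionary.

```python
import math

def assign_depths(hyperedges, n_vertices):
    """Layer-by-layer leaf removal: depth d goes to every vertex whose degree
    is below 2 at the start of round d; its hyperedges are then deleted."""
    incident = [[] for _ in range(n_vertices)]
    for e, vs in enumerate(hyperedges):
        for v in vs:
            incident[v].append(e)
    degree = [len(inc) for inc in incident]
    alive = [True] * len(hyperedges)
    depth, level = {}, 1
    while True:
        layer = [v for v in range(n_vertices)
                 if v not in depth and degree[v] < 2]
        if not layer:
            break
        for v in layer:
            depth[v] = level
        for v in layer:
            for e in incident[v]:
                if alive[e]:
                    alive[e] = False
                    for u in hyperedges[e]:
                        degree[u] -= 1
        level += 1
    return depth

def edges_below(v, hyperedges, depth):
    """Hyperedges in which v has the smallest depth (ties allowed)."""
    return [e for e, vs in enumerate(hyperedges) if v in vs and
            depth[v] == min(depth.get(u, math.inf) for u in vs)]

# Hypothetical example: a central hyperedge with three pendant hyperedges.
hyperedges = [(0, 1, 2), (0, 3, 4), (1, 5, 6), (2, 7, 8)]
depth = assign_depths(hyperedges, 9)
print(depth)
print({v: edges_below(v, hyperedges, depth) for v in depth})
```

In this example the six outer vertices form the surface (depth 1) and the three central ones get depth 2. Each vertex has at most one hyperedge in which it is of minimum depth.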

The evolution of this "collective" leaf removal process
can be described in terms of the same function $f_1(t)$ used
previously. At each time step a depth is assigned to a
fraction $f_1(t)$ of vertices and then the time is increased
by $\Delta t = f_1(t)$, in order to take into account the deletion
of hyperedges leaving from the just numbered vertices.
For very large times and depths, $f_1(t)$ is very small and
can be approximated by $f_1(t) \simeq (t - t^*)\, \partial_t f_1(t^*)$, where
$t^*$ is such that $f_1(t^*) = 0$. In this regime we have that

$$\Delta t = f_1(t), \qquad (A1)$$

$$\Delta f_1(t) = \frac{\partial f_1(t)}{\partial t}\, \Delta t = \frac{\partial f_1(t)}{\partial t}\, f_1(t), \qquad (A2)$$

and so $f_1(t + \Delta t) = f_1(t)\,[1 + \partial_t f_1(t^*)]$. Then the
probability of having a (large) depth h satisfies the equation
$\mathcal{P}(h+1) \simeq \mathcal{P}(h)\, \mu$, where

$$\mu(\gamma) = 1 + \left. \frac{\partial f_1(t)}{\partial t} \right|_{t^*} = 1 + \left. \frac{\partial f_1(\lambda)}{\partial \lambda}\, \frac{\partial \lambda(t)}{\partial t} \right|_{\lambda^*(\gamma)}. \qquad (A3)$$



Since the probability of having depth h drops exponentially
for large h as $\mathcal{P}(h) \propto \mu^h$, the largest depth assigned
with this process is $O(\log N)$. For any $\gamma \ne \gamma_d$ we have
that $\mu(\gamma) < 1$, since $\lambda(t)$ is a decreasing function of t and
$\partial_\lambda f_1(\lambda)$ is positive at the largest root $\lambda^*$, unless $\gamma = \gamma_d$.

Once depths have been assigned, there is an algorithm
(described below) which allows one to change the value of
any floppy variable, by adjusting, at the same time, only
O(1) other floppy variables. Such a new configuration
will be at a finite distance, and, by definition, will
belong to the same cluster. In this way one can change
the configuration of the floppy (and non-core) variables
to any admissible one, and these configurations will form
a unique cluster.
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The physical idea behind the algorithm for flipping any 
floppy variable, keeping all the interactions satisfied, and 
adjusting only a finite number of other variables, is the 
following. Suppose, as in Fig. 14, that we flip the variable
of depth 5. Then the interactions it participates in will be
unsatisfied, and we have to move this 'excess energy' by
flipping other variables, along the shortest way, towards
the boundaries of the hypergraph, that is the vertices
of depth 1, where it can be freely released. The only
delicate point is the definition of the 'path to the
boundary', which has to contain a finite number of vertices. In
Fig. 14 we show a possible way to release the excess
energy generated by flipping the variable of depth 5: Flipping
all the variables belonging to the tree-like bold structure
will keep all the interactions satisfied, since every
interaction contains an even number of vertices belonging to
the bold structure.

We will call this tree-like structure a seaweed, since 
it has a root, corresponding to the vertex of maximum 
depth, and the number of its branches grows approaching 
the surface. Now we give the rules for constructing a 
seaweed, such that its size is finite. 

Let us start with some nomenclature: We say that a 
hyperedge e is "below" a vertex v, and analogously v is 
"above" e, if the depth of v is the smallest among the 
depths of all the vertices in e. 

Thanks to the way depths have been assigned, each
vertex may have at most 1 hyperedge below. This
property can be easily proved, remembering that to any given
vertex v the depth is assigned only when its connectivity
is 0 or 1. At this time, all the other hyperedges of v have
been removed, since we have assigned smaller depths to
its neighbours. The only hyperedge which can be below
v is the last one. Moreover, if the depth is assigned to v
when its connectivity is 0 (isolated vertex), the vertex v
will have no hyperedges below, and we will call it a root.

The construction of the seaweed starts from the vertex 
corresponding to the variable that we want to flip (let 
us call it seed). In this way we are sure that such a 
vertex will be in the structure, and the corresponding 
variable flipped. The seaweed is built up recursively, that 
is we give the rules for growing a single branch, both 
upwards (i.e. towards the surface) and downwards (i.e. 
towards a root), and then these rules must be applied to 
any branch of the seaweed, until it reaches the surface 
of the hypergraph or a root vertex. The branches are 
such that along an upwards (downwards) direction the 
depth strictly decreases (increases). Rare exceptions to 
this property will be illustrated below. 

When a branch passes through a hyperedge it will visit 
only 2 vertices in this hyperedge, such that, when all the 
variables belonging to the seaweed will be flipped, the 
interaction will remain satisfied. 

Suppose the seed vertex has connectivity k. Then we
start k different branches, 1 downwards entering the only
hyperedge below the seed vertex and k - 1 upwards
entering the other hyperedges.

Any upwards branch entering a hyperedge e through 



vertex v has to be continued with the vertex above e. 
If there are many vertices of the same minimum depth 
in e, any of them can be chosen equivalently. With this 
rule we are ensuring that the new vertex added to the 
upwards branch is of smaller depth than v. 

Any downwards branch entering hyperedge e through
vertex v has to be continued with the vertex of maximum
depth in e. If there are many vertices of the same
maximum depth in e, any of them can be chosen equivalently.
With this rule we can ensure that the new vertex added
to the downwards branch will be deeper than v, since v
is of minimum depth in e.

Any growing branch reaching a vertex v of connectivity
k has to be continued with k - 1 branches, in order
to satisfy the rule that all the hyperedges of v must
be visited by a branch. If the just reached vertex is on
the surface (i.e. it has depth 1 and connectivity 1) the
branch ends there. On the contrary, reaching a vertex of
connectivity larger than 2, the growing branch generates
new branches. More specifically, if the branch is an
upwards one it will generate only upwards branches (since it
is coming from the only hyperedge below v), while, if it
is the downwards one, it may generate at most one
downwards branch (all the rest being upwards ones). This is a
consequence of the property that every vertex may have
at most 1 hyperedge below it.

In two cases the unique downwards branch ends in a 
vertex v, which is thus the root of the seaweed: (1) v 
is a root vertex, that is it has no hyperedges below it 
(2) vertex v is above hyperedge e, but v is not the only 
vertex of minimum depth in e. In this case the branch 
entering e through v becomes an upwards one, and makes 
a single step without decreasing the depth (this is the 
only exception to the rule on the monotonicity of the 
depth along a branch stated above). 

Since each branch of the seaweed is grown independently,
it may be that at the end of the process some
vertices end up in more than one branch. This is not a
problem: The rule says that every vertex which has been
included an odd number of times in the seaweed must
be in it, while those entering an even number of times
must be left out. The net result is a decrease in the
total number of vertices in the structure. The seaweed
can eventually break up in more than a single connected
component: All the components, but that containing the
seed, can be removed from the seaweed.

The choice of growing the branches always along ver- 
tices of maximum and minimum depths is dictated by 
the need of reaching a root vertex and the surface of the 
hypergraph as soon as possible, thus making the seaweed 
as small as possible. It is worth noticing that the proba- 
bility that a vertex is a root increases for larger depths. 

The last point to be proven is that the typical distance,
$\ell$, measured along any branch, among the root of the
seaweed and the surface, is finite (and not of order $\log N$).
This property together with the fact that the branching
ratio is proportional to the connectivity, which is finite
too, implies that the number of vertices in the seaweed,
which is roughly proportional to $(3\gamma - 1)^{\ell}$, is finite. On
the contrary if $\ell$ were of order $\log N$, the volume of
the seaweed would diverge for large N.

In order to show that $\ell$ is finite, even when the root
depth is as large as possible (i.e. of order $\log N$), we need to
know the probability that a vertex has depth h. This
probability distribution function, $\mathcal{P}(h)$, can be
calculated exactly, but its expression is too involved to be
presented here. We only report some features relevant
for our purposes. It depends on the connectivity of the
vertex, $\mathcal{P}_k(h)$, and for k = 0 or k = 1 it is trivially
given by $\mathcal{P}_{0,1}(h) = \delta(h - 1)$. For any $k \ge 2$, it
decreases exponentially fast for large h, and the probability
of reaching a vertex (not on the surface) of depth h
is $Q(h) = \sum_{k \ge 2} k f_k(0)\, \mathcal{P}_k(h) \propto \mu^h$ for large h. For
the present calculation the exact shape of $Q(h)$ at small
depths is irrelevant, and we only care about its tail, so
we can hereafter use $Q(h) = \mu^h$ for all h.

We show now that, with such a distribution of depths,
even starting from a root of depth $O(\log N)$, an upwards
branch needs only a finite number of steps to reach the
surface (for simplicity we fix to 0, instead of 1, the surface
depth). The probability of going in a single step from
depth $h_1$ to depth $h_2$ is

$$w(h_1 \to h_2) = \frac{1 - \mu}{1 - \mu^{h_1}}\, \mu^{h_2}, \qquad (A4)$$

which has the correct normalization $\sum_{h_2=0}^{h_1 - 1} w(h_1 \to h_2) = 1$.
The probability of going from depth h to depth 0
in m steps is then



$$W_h(m) = \sum_{h > h_1 > h_2 > \cdots > h_{m-1} \ge 1} w(h \to h_1)\, w(h_1 \to h_2) \cdots w(h_{m-2} \to h_{m-1})\, w(h_{m-1} \to 0)$$

$$= \frac{1 - \mu}{1 - \mu^h}\, \frac{1}{(m-1)!}\, {\sum_{\{h_j\}}}' \prod_{j=1}^{m-1} \frac{(1 - \mu)\, \mu^{h_j}}{1 - \mu^{h_j}} \le \frac{1 - \mu}{1 - \mu^h}\, \frac{1}{(m-1)!} \left[ \sum_{i=1}^{h-1} \frac{(1 - \mu)\, \mu^i}{1 - \mu^i} \right]^{m-1}, \qquad (A5)$$

where the primed sum is over the m - 1 intermediate
depths, taking different values between 1 and h - 1, and
the inequality follows since in the last term we have
included also configurations with indices taking equal
values. So $W_h(m)$ is upper bounded by a Poissonian
distribution with a mean number of steps

$$\ell(h) = (1 - \mu) \sum_{i=1}^{h-1} \frac{\mu^i}{1 - \mu^i}. \qquad (A6)$$



As expected, $\ell$ is an increasing function of h. In the limit
of a very deep root, $h \to \infty$, the series converges for any
$\mu < 1$ (i.e. $\gamma > \gamma_d$), and thus $\ell(\infty)$ is still finite.
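
The bound can be checked numerically. The sketch below assumes the single-step probability $w(h_1 \to h_2) = (1-\mu)\mu^{h_2}/(1-\mu^{h_1})$ and the mean $\ell(h) = (1-\mu)\sum_{i=1}^{h-1} \mu^i/(1-\mu^i)$. It computes the exact distribution of the number of steps needed to reach depth 0 by dynamic programming and compares it with the Poisson-like bound. The function names and the values h = 20, $\mu$ = 0.8 are illustrative choices.

```python
import math

def ell(h, mu):
    """Mean of the bounding distribution: (1 - mu) * sum_{i<h} mu^i / (1 - mu^i)."""
    return (1 - mu) * sum(mu ** i / (1 - mu ** i) for i in range(1, h))

def step_distribution(h, mu, m_max):
    """Probability of reaching depth 0 from depth h in exactly m steps, m = 0..m_max."""
    def w(a, b):
        return (1 - mu) / (1 - mu ** a) * mu ** b
    W = [[0.0] * (m_max + 1) for _ in range(h + 1)]
    for d in range(1, h + 1):
        W[d][1] = w(d, 0)                       # direct jump to the surface
        for m in range(2, m_max + 1):           # first step to d1, then m - 1 more
            W[d][m] = sum(w(d, d1) * W[d1][m - 1] for d1 in range(1, d))
    return W[h]

def poisson_bound(h, mu, m):
    """Upper bound on the m-step probability, with ell(h) as the Poisson mean."""
    return ((1 - mu) / (1 - mu ** h)
            * ell(h, mu) ** (m - 1) / math.factorial(m - 1))

h, mu = 20, 0.8
W = step_distribution(h, mu, h)
for m in range(1, 8):
    print(m, W[m], poisson_bound(h, mu, m))
print(ell(50, mu), ell(200, mu), ell(400, mu))
```

The printed values of $\ell(h)$ saturate quickly as h grows, which is the numerical counterpart of the convergence of the series for $\mu < 1$.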
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